{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://github.com/retkowsky/visual-search-azureAI/blob/main/logo.jpg?raw=true\">\n",
    "\n",
    "# Fashion Visual Search demo\n",
    "\n",
    "## 7. Visual search - Index management\n",
    "\n",
    "This code demonstrates how to use **Azure Cognitive Search** with **Cognitive Services Florence Vision API** and Azure Python SDK for visual search.\n",
    "\n",
    "## Visual search with vector embeddings\n",
    "Vector embeddings are a way of representing content such as text or images as vectors of real numbers in a high-dimensional space. These embeddings are often learned from large amounts of textual and visual data using machine learning algorithms like neural networks. Each dimension of the vector corresponds to a different feature or attribute of the content, such as its semantic meaning, syntactic role, or context in which it commonly appears. By representing content as vectors, we can perform mathematical operations on them to compare their similarity or use them as inputs to machine learning models.\n",
    "\n",
    "## Process\n",
    "<img src=\"https://raw.githubusercontent.com/retkowsky/Azure-Computer-Vision-in-a-day-workshop/72c07afc4fcc04a29ca19b84d3d343a09a22368e//fashionprocess.png\" width=512>\n",
    "\n",
    "## Business applications\n",
    "- Digital asset management: Image retrieval can be used to manage large collections of digital images, such as in museums, archives, or online galleries. Users can search for images based on visual features and retrieve the images that match their criteria.\n",
    "- Medical image retrieval: Image retrieval can be used in medical imaging to search for images based on their diagnostic features or disease patterns. This can help doctors or researchers to identify similar cases or track disease progression.\n",
    "- Security and surveillance: Image retrieval can be used in security and surveillance systems to search for images based on specific features or patterns, such as in, people & object tracking, or threat detection.\n",
    "- Forensic image retrieval: Image retrieval can be used in forensic investigations to search for images based on their visual content or metadata, such as in cases of cyber-crime.\n",
    "- E-commerce: Image retrieval can be used in online shopping applications to search for similar products based on their features or descriptions or provide recommendations based on previous purchases.\n",
    "- Fashion and design: Image retrieval can be used in fashion and design to search for images based on their visual features, such as color, pattern, or texture. This can help designers or retailers to identify similar products or trends.\n",
    "\n",
    "## To learn more\n",
    "- https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-image-retrieval\n",
    "- https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search\n",
    "\n",
    "In this notebook we took some samples fashion images are taken from this link:<br>\n",
    "https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset\n",
    "\n",
    "> Note: Image retrieval is curently in public preview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Python librairies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "import io\n",
    "import json\n",
    "import glob\n",
    "import math\n",
    "import os\n",
    "import requests\n",
    "import sys\n",
    "import time\n",
    "\n",
    "from azure.core.credentials import AzureKeyCredential\n",
    "from azure.search.documents import SearchClient\n",
    "from azure.search.documents.indexes import SearchIndexClient\n",
    "from azure.search.documents.indexes import SearchIndexerClient\n",
    "from azure.search.documents.indexes.models import (\n",
    "    PrioritizedFields,\n",
    "    SearchableField,\n",
    "    SearchField,\n",
    "    SearchFieldDataType,\n",
    "    SearchIndex,\n",
    "    SearchIndexerDataContainer,\n",
    "    SearchIndexerDataSourceConnection,\n",
    "    SemanticConfiguration,\n",
    "    SemanticField,\n",
    "    SemanticSettings,\n",
    "    SimpleField,\n",
    "    VectorSearch,\n",
    "    VectorSearchAlgorithmConfiguration,\n",
    ")\n",
    "from azure.storage.blob import BlobServiceClient\n",
    "from dotenv import load_dotenv\n",
    "from io import BytesIO\n",
    "from PIL import Image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'3.8.5 (default, Sep  4 2020, 07:30:14) \\n[GCC 7.3.0]'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sys.version"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Today is 2023-10-18 13:39:42.340307\n"
     ]
    }
   ],
   "source": [
    "print(\"Today is\", datetime.datetime.today())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Azure AI Services"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "load_dotenv(\"azure.env\")\n",
    "\n",
    "# Azure Computer Vision 4\n",
    "acv_key = os.getenv(\"acv_key\")\n",
    "acv_endpoint = os.getenv(\"acv_endpoint\")\n",
    "\n",
    "# Azure Cognitive Search\n",
    "acs_endpoint = os.getenv(\"acs_endpoint\")\n",
    "acs_key = os.getenv(\"acs_key\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Ensure that the azure endpoints should not finished a /\n",
    "if acv_endpoint.endswith(\"/\"):\n",
    "    acv_endpoint = acv_endpoint[:-1]\n",
    "\n",
    "if acs_endpoint.endswith(\"/\"):\n",
    "    acs_endpoint = acv_endpoint[:-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Azure Cognitive Search index name to create\n",
    "index_name = \"azure-fashion-demo\"\n",
    "\n",
    "# Azure Cognitive Search api version\n",
    "api_version = \"2023-02-01-preview\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "session = requests.Session()\n",
    "\n",
    "\n",
    "def image_embedding(imagefile):\n",
    "    \"\"\"\n",
    "    Image embedding using Azure Computer Vision 4.0\n",
    "    \"\"\"\n",
    "    version = \"?api-version=\" + api_version + \"&modelVersion=latest\"\n",
    "    vec_img_url = acv_endpoint + \"/computervision/retrieval:vectorizeImage\" + version\n",
    "    headers = {\n",
    "        \"Content-type\": \"application/octet-stream\",\n",
    "        \"Ocp-Apim-Subscription-Key\": acv_key,\n",
    "    }\n",
    "\n",
    "    try:\n",
    "        blob_service_client = BlobServiceClient.from_connection_string(\n",
    "            blob_connection_string\n",
    "        )\n",
    "        container_client = blob_service_client.get_container_client(container_name)\n",
    "\n",
    "        blob_client = container_client.get_blob_client(imagefile)\n",
    "        stream = BytesIO()\n",
    "        blob_data = blob_client.download_blob()\n",
    "        blob_data.readinto(stream)\n",
    "\n",
    "        stream.seek(0)  # Reset stream position to the beginning\n",
    "\n",
    "        response = session.post(vec_img_url, data=stream, headers=headers)\n",
    "        response.raise_for_status()  # Raise an exception if response is not 200\n",
    "\n",
    "        image_emb = response.json()[\"vector\"]\n",
    "        return image_emb\n",
    "\n",
    "    except requests.exceptions.RequestException as e:\n",
    "        print(f\"Request Exception: {e}\")\n",
    "    except Exception as ex:\n",
    "        print(f\"Error: {ex}\")\n",
    "\n",
    "    return None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "def index_status(index_name):\n",
    "    \"\"\"\n",
    "    Azure Cognitive Search index status\n",
    "    \"\"\"\n",
    "    print(\"Azure Cognitive Search Index:\", index_name, \"\\n\")\n",
    "\n",
    "    headers = {\"Content-Type\": \"application/json\", \"api-key\": acs_key}\n",
    "    params = {\"api-version\": \"2021-04-30-Preview\"}\n",
    "    index_status = requests.get(\n",
    "        acs_endpoint + \"/indexes/\" + index_name, headers=headers, params=params\n",
    "    )\n",
    "    try:\n",
    "        print(json.dumps((index_status.json()), indent=5))\n",
    "    except:\n",
    "        print(\"Request failed\")\n",
    "\n",
    "\n",
    "def index_stats(index_name):\n",
    "    \"\"\"\n",
    "    Get statistics about Azure Cognitive Search index\n",
    "    \"\"\"\n",
    "    url = (\n",
    "        acs_endpoint\n",
    "        + \"/indexes/\"\n",
    "        + index_name\n",
    "        + \"/stats?api-version=2021-04-30-Preview\"\n",
    "    )\n",
    "    headers = {\n",
    "        \"Content-Type\": \"application/json\",\n",
    "        \"api-key\": acs_key,\n",
    "    }\n",
    "    response = requests.get(url, headers=headers)\n",
    "    print(\"Azure Cognitive Search index status for:\", index_name, \"\\n\")\n",
    "\n",
    "    if response.status_code == 200:\n",
    "        res = response.json()\n",
    "        print(json.dumps(res, indent=2))\n",
    "        document_count = res[\"documentCount\"]\n",
    "        storage_size = res[\"storageSize\"]\n",
    "\n",
    "    else:\n",
    "        print(\"Request failed with status code:\", response.status_code)\n",
    "\n",
    "    return document_count, storage_size\n",
    "\n",
    "\n",
    "def acs_service_statistics(index_name):\n",
    "    \"\"\"\n",
    "    Azure Cognitive Search service statistics\n",
    "    \"\"\"\n",
    "    url = os.getenv(\"acs_endpoint\") + \"/servicestats?api-version=2021-04-30-Preview\"\n",
    "    headers = {\n",
    "        \"Content-Type\": \"application/json\",\n",
    "        \"api-key\": os.getenv(\"acs_key\"),\n",
    "    }\n",
    "    response = requests.get(url, headers=headers)\n",
    "    print(\"Azure Cognitive Search index status for:\", index_name, \"\\n\")\n",
    "\n",
    "    if response.status_code == 200:\n",
    "        res = response.json()\n",
    "        print(json.dumps(res, indent=2))\n",
    "\n",
    "    else:\n",
    "        print(\"Request failed with status code:\", response.status_code)\n",
    "        \n",
    "        \n",
    "def delete_acs_document_by_id(document_id):\n",
    "    \"\"\"\n",
    "    Delete Azure Cognitive Search document\n",
    "    \"\"\"\n",
    "    # Search client\n",
    "    search_client = SearchClient(acs_endpoint, index_name, AzureKeyCredential(acs_key))\n",
    "\n",
    "    # Delete document by its id number\n",
    "    print(f\"Document with id {document_id} will be deleted from index {index_name}\")\n",
    "    document_to_delete = {\"idfile\": document_id}  # Delete the entry based on the idfile\n",
    "    search_client.delete_documents([document_to_delete])\n",
    "\n",
    "    print(\"Done\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Azure Cognitive search index informations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Azure Cognitive Search index status for: azure-fashion-demo \n",
      "\n",
      "{\n",
      "  \"@odata.context\": \"https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.ServiceStatistics\",\n",
      "  \"counters\": {\n",
      "    \"documentCount\": {\n",
      "      \"usage\": 33242,\n",
      "      \"quota\": null\n",
      "    },\n",
      "    \"indexesCount\": {\n",
      "      \"usage\": 21,\n",
      "      \"quota\": 50\n",
      "    },\n",
      "    \"indexersCount\": {\n",
      "      \"usage\": 1,\n",
      "      \"quota\": 50\n",
      "    },\n",
      "    \"dataSourcesCount\": {\n",
      "      \"usage\": 3,\n",
      "      \"quota\": 50\n",
      "    },\n",
      "    \"storageSize\": {\n",
      "      \"usage\": 1026951396,\n",
      "      \"quota\": 26843545600\n",
      "    },\n",
      "    \"synonymMaps\": {\n",
      "      \"usage\": 0,\n",
      "      \"quota\": 5\n",
      "    },\n",
      "    \"skillsetCount\": {\n",
      "      \"usage\": 0,\n",
      "      \"quota\": 50\n",
      "    },\n",
      "    \"aliasesCount\": {\n",
      "      \"usage\": 0,\n",
      "      \"quota\": 100\n",
      "    }\n",
      "  },\n",
      "  \"limits\": {\n",
      "    \"maxFieldsPerIndex\": 3000,\n",
      "    \"maxFieldNestingDepthPerIndex\": 10,\n",
      "    \"maxComplexCollectionFieldsPerIndex\": 40,\n",
      "    \"maxComplexObjectsInCollectionsPerDocument\": 3000\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "acs_service_statistics(index_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Azure Cognitive Search Index: azure-fashion-demo \n",
      "\n",
      "{\n",
      "     \"@odata.context\": \"https://azurecogsearcheastussr.search.windows.net/$metadata#indexes/$entity\",\n",
      "     \"@odata.etag\": \"\\\"0x8DB79721D17ED54\\\"\",\n",
      "     \"name\": \"azure-fashion-demo\",\n",
      "     \"defaultScoringProfile\": null,\n",
      "     \"fields\": [\n",
      "          {\n",
      "               \"name\": \"idfile\",\n",
      "               \"type\": \"Edm.String\",\n",
      "               \"searchable\": false,\n",
      "               \"filterable\": false,\n",
      "               \"retrievable\": true,\n",
      "               \"sortable\": false,\n",
      "               \"facetable\": false,\n",
      "               \"key\": true,\n",
      "               \"indexAnalyzer\": null,\n",
      "               \"searchAnalyzer\": null,\n",
      "               \"analyzer\": null,\n",
      "               \"normalizer\": null,\n",
      "               \"synonymMaps\": []\n",
      "          },\n",
      "          {\n",
      "               \"name\": \"imagefile\",\n",
      "               \"type\": \"Edm.String\",\n",
      "               \"searchable\": true,\n",
      "               \"filterable\": false,\n",
      "               \"retrievable\": true,\n",
      "               \"sortable\": false,\n",
      "               \"facetable\": false,\n",
      "               \"key\": false,\n",
      "               \"indexAnalyzer\": null,\n",
      "               \"searchAnalyzer\": null,\n",
      "               \"analyzer\": null,\n",
      "               \"normalizer\": null,\n",
      "               \"synonymMaps\": []\n",
      "          },\n",
      "          {\n",
      "               \"name\": \"imagevector\",\n",
      "               \"type\": \"Collection(Edm.Single)\",\n",
      "               \"searchable\": true,\n",
      "               \"filterable\": false,\n",
      "               \"retrievable\": true,\n",
      "               \"sortable\": false,\n",
      "               \"facetable\": false,\n",
      "               \"key\": false,\n",
      "               \"indexAnalyzer\": null,\n",
      "               \"searchAnalyzer\": null,\n",
      "               \"analyzer\": null,\n",
      "               \"normalizer\": null,\n",
      "               \"synonymMaps\": []\n",
      "          }\n",
      "     ],\n",
      "     \"scoringProfiles\": [],\n",
      "     \"corsOptions\": null,\n",
      "     \"suggesters\": [],\n",
      "     \"analyzers\": [],\n",
      "     \"normalizers\": [],\n",
      "     \"tokenizers\": [],\n",
      "     \"tokenFilters\": [],\n",
      "     \"charFilters\": [],\n",
      "     \"encryptionKey\": null,\n",
      "     \"similarity\": {\n",
      "          \"@odata.type\": \"#Microsoft.Azure.Search.BM25Similarity\",\n",
      "          \"k1\": null,\n",
      "          \"b\": null\n",
      "     },\n",
      "     \"semantic\": {\n",
      "          \"defaultConfiguration\": null,\n",
      "          \"configurations\": [\n",
      "               {\n",
      "                    \"name\": \"my-semantic-config\",\n",
      "                    \"prioritizedFields\": {\n",
      "                         \"titleField\": {\n",
      "                              \"fieldName\": \"idfile\"\n",
      "                         },\n",
      "                         \"prioritizedContentFields\": [],\n",
      "                         \"prioritizedKeywordsFields\": []\n",
      "                    }\n",
      "               }\n",
      "          ]\n",
      "     }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "index_status(index_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Azure Cognitive Search index status for: azure-fashion-demo \n",
      "\n",
      "{\n",
      "  \"@odata.context\": \"https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.IndexStatistics\",\n",
      "  \"documentCount\": 10219,\n",
      "  \"storageSize\": 155597306\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "document_count, storage_size = index_stats(index_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of documents in the index = 10,219\n",
      "Size of the index = 148.39 MB\n"
     ]
    }
   ],
   "source": [
    "print(\"Number of documents in the index =\", f\"{document_count:,}\")\n",
    "print(\"Size of the index =\", round(storage_size / (1024 * 1024), 2), \"MB\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Quick search on a document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "search_client = SearchClient(acs_endpoint, index_name, AzureKeyCredential(acs_key))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Id file: 3062\n",
      "Filename: fashion/0627214001.jpg\n"
     ]
    }
   ],
   "source": [
    "text = \"0627214001\"\n",
    "response = search_client.search(search_text=text)\n",
    "\n",
    "for result in response:\n",
    "    print(\"Id file:\", result[\"idfile\"])\n",
    "    print(\"Filename:\", result[\"imagefile\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Let's query the index with text = 0628469001 \n",
      "\n",
      "Id file: 3267\n",
      "Filename: fashion/0628469001.jpg\n"
     ]
    }
   ],
   "source": [
    "text = \"0628469001\"\n",
    "print(\"Let's query the index with text =\", text, \"\\n\")\n",
    "\n",
    "response = search_client.search(search_text=text)\n",
    "\n",
    "for result in response:\n",
    "    print(\"Id file:\", result[\"idfile\"])\n",
    "    print(\"Filename:\", result[\"imagefile\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Let's delete some documents from the index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Azure Cognitive Search index status for: azure-fashion-demo \n",
      "\n",
      "{\n",
      "  \"@odata.context\": \"https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.IndexStatistics\",\n",
      "  \"documentCount\": 10221,\n",
      "  \"storageSize\": 155633480\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "document_count, storage_size = index_stats(index_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document with id 3062 will be deleted from index azure-fashion-demo\n",
      "Done\n"
     ]
    }
   ],
   "source": [
    "document_id = \"3062\"\n",
    "delete_acs_document_by_id(document_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Azure Cognitive Search index status for: azure-fashion-demo \n",
      "\n",
      "{\n",
      "  \"@odata.context\": \"https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.IndexStatistics\",\n",
      "  \"documentCount\": 10220,\n",
      "  \"storageSize\": 155618254\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "document_count, storage_size = index_stats(index_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document with id 3267 will be deleted from index azure-fashion-demo\n",
      "Done\n"
     ]
    }
   ],
   "source": [
    "document_id = \"3267\"\n",
    "delete_acs_document_by_id(document_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Azure Cognitive Search index status for: azure-fashion-demo \n",
      "\n",
      "{\n",
      "  \"@odata.context\": \"https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.IndexStatistics\",\n",
      "  \"documentCount\": 10219,\n",
      "  \"storageSize\": 155603130\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "document_count, storage_size = index_stats(index_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> We have deleted 2 documents from the index as we can see"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Let's add some new images to the index\n",
    "\n",
    "Let's use a new blob to use that contains some new images to add"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Azure storage account\n",
    "blob_connection_string = \"tobereplaced\"\n",
    "container_name = \"tobereplaced\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "URL of the first blob: https://azurestorageaccountsr.blob.core.windows.net/fashion-images-new/shirt1.jpg\n"
     ]
    }
   ],
   "source": [
    "# Connect to Blob Storage\n",
    "blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)\n",
    "container_client = blob_service_client.get_container_client(container_name)\n",
    "blobs = container_client.list_blobs()\n",
    "\n",
    "first_blob = next(blobs)\n",
    "blob_url = container_client.get_blob_client(first_blob).url\n",
    "print(f\"URL of the first blob: {blob_url}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done. Data source 'azure-fashion-demo-blob' has been created or updated.\n"
     ]
    }
   ],
   "source": [
    "# Create a data source\n",
    "ds_client = SearchIndexerClient(acs_endpoint, AzureKeyCredential(acs_key))\n",
    "container = SearchIndexerDataContainer(name=container_name)\n",
    "data_source_connection = SearchIndexerDataSourceConnection(\n",
    "    name=f\"{index_name}-blob\",\n",
    "    type=\"azureblob\",\n",
    "    connection_string=blob_connection_string,\n",
    "    container=container,\n",
    ")\n",
    "data_source = ds_client.create_or_update_data_source_connection(data_source_connection)\n",
    "\n",
    "print(f\"Done. Data source '{data_source.name}' has been created or updated.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of images = 2 in blob: fashion-images-new\n"
     ]
    }
   ],
   "source": [
    "blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)\n",
    "container_client = blob_service_client.get_container_client(container_name)\n",
    "number_images = len(list(container_client.list_blobs()))\n",
    "\n",
    "print(\"Total number of images =\", number_images, \"in blob:\", container_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "EMBEDDINGS_DIR = \"embeddings\"\n",
    "\n",
    "os.makedirs(EMBEDDINGS_DIR, exist_ok=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "list_of_images = container_client.list_blobs()\n",
    "\n",
    "new_images_list = []\n",
    "\n",
    "for image in list_of_images:\n",
    "    imagefile = image[\"name\"]\n",
    "    new_images_list.append(imagefile)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['shirt1.jpg', 'shirt2.jpg']"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_images_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-rwxrwxrwx 1 root root 98 Oct 18 13:55 embeddings/list_of_images_new.json\r\n"
     ]
    }
   ],
   "source": [
    "startindex = 100003\n",
    "\n",
    "data = [\n",
    "    {\"idfile\": str(i + startindex +1), \"imagefile\": image} for i, image in enumerate(new_images_list)\n",
    "]\n",
    "\n",
    "with open(os.path.join(EMBEDDINGS_DIR, \"list_of_images_new.json\"), \"w\") as f:\n",
    "    json.dump(data, f)\n",
    "    \n",
    "!ls $EMBEDDINGS_DIR/list_of_images_new.json -lh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'idfile': '100004', 'imagefile': 'shirt1.jpg'},\n",
       " {'idfile': '100005', 'imagefile': 'shirt2.jpg'}]"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running the image files embeddings...\n",
      "Total number of images to embed = 2 \n",
      "\n",
      "\n",
      "Done\n",
      "\n",
      "Elapsed time: 00:00:01.166330\n",
      "Time per image = 0.58317 seconds\n"
     ]
    }
   ],
   "source": [
    "batch_size = 1\n",
    "\n",
    "start = time.time()\n",
    "print(\"Running the image files embeddings...\")\n",
    "print(\"Total number of images to embed =\", len(new_images_list), \"\\n\")\n",
    "\n",
    "with open(\n",
    "    os.path.join(EMBEDDINGS_DIR, \"list_of_images_new.json\"), \"r\", encoding=\"utf-8\"\n",
    ") as file:\n",
    "    input_data = json.load(file)\n",
    "\n",
    "image_count = len(input_data)\n",
    "processed_count = 0\n",
    "\n",
    "for batch_start in range(0, image_count, batch_size):\n",
    "    batch_end = min(batch_start + batch_size, image_count)\n",
    "    batch_data = input_data[batch_start:batch_end]\n",
    "\n",
    "    for idx, item in enumerate(batch_data, start=batch_start + 1):\n",
    "        imgindex = item[\"idfile\"]\n",
    "        imgfile = item[\"imagefile\"]\n",
    "        item[\"imagevector\"] = image_embedding(imgfile)\n",
    "\n",
    "        if idx % batch_size == 1:\n",
    "            pctdone = round(idx / image_count * 100)\n",
    "            dt = datetime.datetime.today().strftime(\"%d-%b-%Y %H:%M:%S\")\n",
    "            print(\n",
    "                dt,\n",
    "                f\"Number of processed image files = {idx:06} of {image_count:06} | Done: {pctdone}%\",\n",
    "            )\n",
    "\n",
    "    processed_count += len(batch_data)\n",
    "\n",
    "elapsed = time.time() - start\n",
    "print(\"\\nDone\")\n",
    "print(\n",
    "    \"\\nElapsed time: \"\n",
    "    + time.strftime(\n",
    "        \"%H:%M:%S.{}\".format(str(elapsed % 1)[2:])[:15], time.gmtime(elapsed)\n",
    "    )\n",
    ")\n",
    "print(\"Time per image =\", round(elapsed / processed_count, 5), \"seconds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Saving the results into a json file...\n",
      "Done. Elapsed time: 0.05 secs\n"
     ]
    }
   ],
   "source": [
    "# Save embeddings to documents.json file\n",
    "start = time.time()\n",
    "\n",
    "print(\"Saving the results into a json file...\")\n",
    "with open(os.path.join(EMBEDDINGS_DIR, \"documents_new.json\"), \"w\") as f:\n",
    "    json.dump(input_data, f)\n",
    "\n",
    "print(\"Done. Elapsed time:\", round(time.time() - start, 2), \"secs\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Size of the documents to load = 2\n"
     ]
    }
   ],
   "source": [
    "with open(os.path.join(EMBEDDINGS_DIR, \"documents_new.json\"), \"r\") as file:\n",
    "    documents = json.load(file)\n",
    "\n",
    "print(\"Size of the documents to load =\", len(documents))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "def loading_documents(documents):\n",
    "    \"\"\"\n",
    "    Loading documents into the Azure Cognitive Search index\n",
    "    \"\"\"\n",
    "    # Upload some documents to the index\n",
    "    print(\"Uploading the documents into the index\", index_name, \"...\")\n",
    "\n",
    "    # Setting the Azure Cognitive Search client\n",
    "    search_client = SearchClient(\n",
    "        endpoint=acs_endpoint,\n",
    "        index_name=index_name,\n",
    "        credential=AzureKeyCredential(acs_key),\n",
    "    )\n",
    "    response = search_client.upload_documents(documents)\n",
    "    print(\n",
    "        f\"\\nDone. Uploaded {len(documents)} documents into the Azure Cognitive Search index.\\n\"\n",
    "    )\n",
    "    return len(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Uploading the documents into the index azure-fashion-demo ...\n",
      "\n",
      "Done. Uploaded 2 documents into the Azure Cognitive Search index.\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "loading_documents(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Azure Cognitive Search index status for: azure-fashion-demo \n",
      "\n",
      "{\n",
      "  \"@odata.context\": \"https://azurecogsearcheastussr.search.windows.net/$metadata#Microsoft.Azure.Search.V2021_04_30_Preview.IndexStatistics\",\n",
      "  \"documentCount\": 10221,\n",
      "  \"storageSize\": 155639272\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "document_count, storage_size = index_stats(index_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> This is the new size of the index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Post processing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Azure Cognitive Search Index: azure-fashion-demo \n",
      "\n",
      "{\n",
      "     \"@odata.context\": \"https://azurecogsearcheastussr.search.windows.net/$metadata#indexes/$entity\",\n",
      "     \"@odata.etag\": \"\\\"0x8DB79721D17ED54\\\"\",\n",
      "     \"name\": \"azure-fashion-demo\",\n",
      "     \"defaultScoringProfile\": null,\n",
      "     \"fields\": [\n",
      "          {\n",
      "               \"name\": \"idfile\",\n",
      "               \"type\": \"Edm.String\",\n",
      "               \"searchable\": false,\n",
      "               \"filterable\": false,\n",
      "               \"retrievable\": true,\n",
      "               \"sortable\": false,\n",
      "               \"facetable\": false,\n",
      "               \"key\": true,\n",
      "               \"indexAnalyzer\": null,\n",
      "               \"searchAnalyzer\": null,\n",
      "               \"analyzer\": null,\n",
      "               \"normalizer\": null,\n",
      "               \"synonymMaps\": []\n",
      "          },\n",
      "          {\n",
      "               \"name\": \"imagefile\",\n",
      "               \"type\": \"Edm.String\",\n",
      "               \"searchable\": true,\n",
      "               \"filterable\": false,\n",
      "               \"retrievable\": true,\n",
      "               \"sortable\": false,\n",
      "               \"facetable\": false,\n",
      "               \"key\": false,\n",
      "               \"indexAnalyzer\": null,\n",
      "               \"searchAnalyzer\": null,\n",
      "               \"analyzer\": null,\n",
      "               \"normalizer\": null,\n",
      "               \"synonymMaps\": []\n",
      "          },\n",
      "          {\n",
      "               \"name\": \"imagevector\",\n",
      "               \"type\": \"Collection(Edm.Single)\",\n",
      "               \"searchable\": true,\n",
      "               \"filterable\": false,\n",
      "               \"retrievable\": true,\n",
      "               \"sortable\": false,\n",
      "               \"facetable\": false,\n",
      "               \"key\": false,\n",
      "               \"indexAnalyzer\": null,\n",
      "               \"searchAnalyzer\": null,\n",
      "               \"analyzer\": null,\n",
      "               \"normalizer\": null,\n",
      "               \"synonymMaps\": []\n",
      "          }\n",
      "     ],\n",
      "     \"scoringProfiles\": [],\n",
      "     \"corsOptions\": null,\n",
      "     \"suggesters\": [],\n",
      "     \"analyzers\": [],\n",
      "     \"normalizers\": [],\n",
      "     \"tokenizers\": [],\n",
      "     \"tokenFilters\": [],\n",
      "     \"charFilters\": [],\n",
      "     \"encryptionKey\": null,\n",
      "     \"similarity\": {\n",
      "          \"@odata.type\": \"#Microsoft.Azure.Search.BM25Similarity\",\n",
      "          \"k1\": null,\n",
      "          \"b\": null\n",
      "     },\n",
      "     \"semantic\": {\n",
      "          \"defaultConfiguration\": null,\n",
      "          \"configurations\": [\n",
      "               {\n",
      "                    \"name\": \"my-semantic-config\",\n",
      "                    \"prioritizedFields\": {\n",
      "                         \"titleField\": {\n",
      "                              \"fieldName\": \"idfile\"\n",
      "                         },\n",
      "                         \"prioritizedContentFields\": [],\n",
      "                         \"prioritizedKeywordsFields\": []\n",
      "                    }\n",
      "               }\n",
      "          ]\n",
      "     }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "index_status(index_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#delete_index(index_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
